Add multi-period DiD support#6
Merged
Merged
Conversation
Extend the library to optionally work across multiple time periods: - Add MultiPeriodDiD estimator that handles multiple pre/post periods - Add MultiPeriodDiDResults for period-specific and average ATT - Add PeriodEffect dataclass for individual period treatment effects - Support for period dummies and treatment × period interactions - Compute average ATT with proper covariance-adjusted standard errors - Allow custom reference period selection - Auto-infer post-periods when not specified (last half of periods) - Full support for covariates, fixed effects, and absorbed FE - Cluster-robust and heteroskedasticity-robust standard errors - Comprehensive test suite with 28 new tests (70 total passing)
igerber
pushed a commit
that referenced
this pull request
Jan 4, 2026
igerber
added a commit
that referenced
this pull request
Apr 18, 2026
Addresses axis B findings #6 and #7 from the silent-failures audit: trop_global.py:448 outer alternating-min loop, trop_global.py:466 hard-coded range(20) inner FISTA loop, and trop_local.py:680 alternating-minimization loop all exited silently on max_iter exhaustion, returning the current iterate as if converged. - trop_global._solve_global_with_lowrank: thread a converged flag through the outer loop; count non-convergence events from the inner FISTA and surface the count in the outer warning for diagnostic context. One warn_if_not_converged call per solver invocation. - trop_local._estimate_model: thread a converged flag through the outer alternating-min loop; call warn_if_not_converged on exhaustion. - REGISTRY updated under TROP. New TestTROPConvergenceWarnings class (4 tests) exercises both global and local paths with forced non-convergence (max_iter=1, tol=1e-15) and a convergent negative control. Notable: the default TROP local config (max_iter=100, tol=1e-6) does not converge within max_iter on typical synthetic panels, so this PR surfaces a previously silent non-convergence that affected routine user fits. No numerical change in the returned iterate; the warning is additive. Axis-B regression-lint baseline: 5 -> 2 silent range(max_iter) loops remaining (minor loops in honest_did/power not yet addressed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
P0: bias_corrected_local_linear now routes the CI through `safe_inference()` so degenerate cases with `se_robust <= 0` or non-finite `se_robust` (e.g., exact-fit / constant-y) return `(NaN, NaN)` rather than a misleading zero-width or infinite CI. Matches the repo-wide inference contract (CLAUDE.md Key Design Pattern #6). P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went through `mse_optimal_bandwidth` which hard-codes unclustered / nn / nnmatch=3, silently mismatching the downstream `lprobust` fit's reported estimator. Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto, nnmatch=5+auto), all asserting the selected bandwidth changes when the corresponding parameter changes (catches silent fallback). Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3 branch (numpy issue #21432 pattern), newly reachable via the wired-through vce='hc1' auto-bandwidth path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 22, 2026
Close BR/DR gap #6: target-parameter block in schemas
6 tasks
igerber
added a commit
that referenced
this pull request
Apr 26, 2026
Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378 shipped #5 by_path + controls). Removes the two NotImplementedError gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds: - New `path_cumulated_event_study` Results field surfacing the cumulated level effect delta_l per path under trends_linear=True (mirrors the global linear_trends_effects cumulation; this is what R returns under did_multiplegt_dyn(..., by_path, trends_lin)). - `set_ids` parameter threaded through the four per-path IF helpers so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically. - to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors cband_*). - summary() renders a per-path "Cumulated Level Effects" sub-section. Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios: - single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4, cohort-single-path, n_periods=13) — per-path cumulated point estimates match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20% - multi_path_reversible_by_path_trends_nonparam — per-path point estimates AND placebos match R bit-exactly, SE within ±15% Placebo parity for trends_linear is intentionally skipped: R's per-path placebo re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture, so the divergence is methodological, not a tolerance issue. Internal regression covers placebo + trends_linear (finite values, bootstrap inheritance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
Apr 26, 2026
Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378 shipped #5 by_path + controls). Removes the two NotImplementedError gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds: - New `path_cumulated_event_study` Results field surfacing the cumulated level effect delta_l per path under trends_linear=True (mirrors the global linear_trends_effects cumulation; this is what R returns under did_multiplegt_dyn(..., by_path, trends_lin)). - `set_ids` parameter threaded through the four per-path IF helpers so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically. - to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors cband_*). - summary() renders a per-path "Cumulated Level Effects" sub-section. Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios: - single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4, cohort-single-path, n_periods=13) — per-path cumulated point estimates match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20% - multi_path_reversible_by_path_trends_nonparam — per-path point estimates AND placebos match R bit-exactly, SE within ±15% Placebo parity for trends_linear is intentionally skipped: R's per-path placebo re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture, so the divergence is methodological, not a tolerance issue. Internal regression covers placebo + trends_linear (finite values, bootstrap inheritance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
Apr 27, 2026
Closes BR/DR foundation gap igerber#6 from project_br_dr_foundation.md: BusinessReport and DiagnosticReport now name what the headline scalar actually represents as an estimand, for each of the 16 result classes. Baker et al. (2025) Step 2 ("define the target parameter") was previously in BR's next_steps list but not done by BR itself — this PR closes that gap. New top-level ``target_parameter`` block (additive schema change; experimental per REPORTING.md stability policy): { "name": str, # stakeholder-facing name "definition": str, # plain-English description "aggregation": str, # machine-readable dispatch tag "headline_attribute": str, # which raw result attribute "reference": str, # REGISTRY.md citation pointer } Schema placement: top-level block (user preference, selected via AskUserQuestion in planning). Aggregation tags include "simple", "event_study", "group", "2x2", "twfe", "iw", "stacked", "ddd", "staggered_ddd", "synthetic", "factor_model", "M", "l", "l_x", "l_fd", "l_x_fd", "dose_overall", "pt_all_combined", "pt_post_single_baseline", "unknown". Per-estimator dispatch lives in the new ``diff_diff/_reporting_helpers.py::describe_target_parameter`` (own module rather than business_report / diagnostic_report to avoid circular-import risk — plan-review LOW igerber#7). All 17 result classes covered (16 from _APPLICABILITY + BaconDecompositionResults); exhaustiveness locked in by TestTargetParameterCoversEveryResultClass. Fit-time config reads: - ``EfficientDiDResults.pt_assumption`` branches the aggregation tag between pt_all_combined and pt_post_single_baseline. - ``StackedDiDResults.clean_control`` varies the definition clause (never_treated / strict / not_yet_treated). - ``ChaisemartinDHaultfoeuilleResults.L_max`` + ``covariate_residuals`` + ``linear_trends_effects`` branches the dCDH estimand between DID_M / DID_l / DID^X_l / DID^{fd}_l / DID^{X,fd}_l. Fixed-tag branches (per plan-review CRITICAL igerber#1 and igerber#2): - ``CallawaySantAnna`` / ``ImputationDiD`` / ``TwoStageDiD`` / ``WooldridgeDiD``: the fit-time ``aggregate`` kwarg does not change the ``overall_att`` scalar — it only populates additional horizon / group tables on the result object. Disambiguating those tables in prose is tracked under gap igerber#9. - ``ContinuousDiDResults``: the PT-vs-SPT regime is a user-level assumption, not a library setting. Emits a single "dose_overall" tag with disjunctive definition naming both regime readings (ATT^loc under PT, ATT^glob under SPT). Prose rendering: - BR ``_render_summary``: emits "Target parameter: <name>." after the headline sentence (short name only; full definition lives in the full_report and schema). - BR ``_render_full_report``: "## Target Parameter" section between "## Headline" and "## Identifying Assumption". - DR ``_render_overall_interpretation``: mirror sentence. - DR ``_render_dr_full_report``: "## Target Parameter" section with name, definition, aggregation tag, headline attribute, and reference. Cross-surface parity: both BR and DR consume the same helper (the single source of truth), so their ``target_parameter`` blocks are byte-identical (verified by TestTargetParameterCrossSurfaceParity). Tests: 37 new (TestTargetParameterPerEstimator + TestTargetParameterFitConfigReads + TestTargetParameterCoversEveryResultClass + TestTargetParameterCrossSurfaceParity + TestTargetParameterProseRendering). Existing BR/DR top-level-key contract tests updated to include ``target_parameter``. Total 319 tests pass (282 prior + 37 new). Docs: REPORTING.md gains a "Target parameter" section documenting the per-estimator dispatch and schema shape. business_report.rst and diagnostic_report.rst note the new field with a pointer to REPORTING.md. CHANGELOG entry under Unreleased. Out of scope: REGISTRY.md per-estimator "Target parameter" sub-sections (plan-review additional-note); the reporting-layer doc in REPORTING.md is the current source of truth. A follow-up docs PR can land those sub-sections if maintainers want the registry to own the canonical wording directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 10, 2026
TDL77
pushed a commit
to TDL77/diff-diff
that referenced
this pull request
May 13, 2026
) The CI AI reviewer's quality has notably degraded since igerber#404 and igerber#415 landed. This restores 5 files to the snapshot at fe80295 (parent of igerber#404's merge, i.e. the last commit on main before either PR landed): .github/workflows/ai_pr_review.yml -- reinstates openai/codex-action@v1 with model: gpt-5.4 and effort: xhigh .github/codex/prompts/pr_review.md -- removes Single-Pass Completeness Mandate (igerber#404) and Audit igerber#6 "Claim-vs-shipped" + tightened verdict bar (igerber#415); restores the original 179-line prompt .claude/scripts/openai_review.py -- drops --ci-mode (igerber#415) and gpt-5.5 PRICING/reasoning entries (igerber#404); DEFAULT_MODEL back to gpt-5.4 .claude/commands/ai-review-local.md -- restores skill doc to match the restored script tests/test_openai_review.py -- restores test suite to match (152 tests pass at the snapshot) igerber#404's body documented the rollback criterion ("if >2 of next 5 PRs surface new P1+ findings on unchanged code in round 2, revert the model bump"). This invokes that rollback and extends it to also revert the Codex -> Python single-shot backend switch from igerber#415. Out of scope: open branch ci-workflow-ipynb-markdown-extraction (PR igerber#414, unmerged) also touches openai_review.py and pr_review.md; that branch will need to rebase against the older base after this lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
May 20, 2026
…ueue + example refs One informational P3 from CI codex R2 — METHODOLOGY_REVIEW.md still described ContinuousDiD as "In Progress" in two surrounding surfaces even after the status-table flip, creating conflicting status signals. Fixed both sites: 1. L27 explanatory paragraph: removed the ContinuousDiD example from the In Progress band's "has methodology file but no paper review" illustration (it's now Complete). 2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD) and renumbered the remaining queue. Retroactive fix per feedback_changelog_accuracy_fixes (CI review catching one factual error in the queue means scanning for the same mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but left entry #6 (HAD) in the same In Progress queue. Removed HAD's entry too and renumbered, so the queue is now self-consistent with the status table for all Complete entries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
igerber
added a commit
that referenced
this pull request
May 22, 2026
Closes the WooldridgeDiD (ETWFE) methodology-review-tracker promotion in METHODOLOGY_REVIEW.md (In Progress → Complete), following the primary-source review for Wooldridge (2025) merged in PR-A (#484). Adds two paper-driven implementation surfaces and extends R-parity goldens to the nonlinear paths. Implementation: - `aggregate(weights="cohort_share")` on WooldridgeDiDResults implements paper Eqs. 7.4 (simple-overall) and 7.6 (event-time, restricted to k>=0) cohort-share aggregation weights as an opt-in alternative to the default cell-count weighting (matching Stata `jwdid_estat`). Inference fields fail-closed to NaN with UserWarning per paper Section 7.5 conditional-on-shares semantics; raises on `survey_design` (design-consistent totals deferred); raises on `type ∈ {"group","calendar"}` (no paper closed-form); raises on bootstrap fits (no matching bootstrap variant). Closes TODO row 95. - `cohort_trends=True` on `WooldridgeDiD.__init__` adds linear `dg_i · t` cohort-specific trend interactions (paper Section 8 / Eq. 8.1) for the OLS path. Rejects on logit/poisson per paper Section 8 OLS scope; rejects on survey_design pending full-dummy/TSL validation; enforces per-cohort pre-period identification check (≥ 2 observed pre-periods per treated cohort). Auto-routes to full-dummy mode regardless of vcov_type. Closes the PR-A Requirements Checklist heterogeneous-trends gap. Tests: - `tests/test_methodology_wooldridge.py` extended with 6 paper-equation-numbered methodology classes (Theorem 3.1, Proposition 5.1, Section 6 event study, Section 7 aggregation paths, Section 8 heterogeneous trends, Section 10 unbalanced panels) + `TestW2025LibraryDeviations` consolidating 5 surviving deviations. Mirrors the HAD PR #473 precedent. - Two new R-parity surface classes (`TestWooldridgeParityRPoisson`, `TestWooldridgeParityRLogit`) lock the structural surface against R `etwfe(family=...)` log-link goldens. - 209 tests total (60 methodology + 149 R-parity + unit regressions). R Goldens: - `benchmarks/R/generate_wooldridge_golden.R` extended with Poisson + logit DGPs via R `etwfe`; augmented panel CSV retains the same seed-generated `y_pois` + `y_logit` columns for cross-language reproducibility. - `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`. Tracker promotion: - METHODOLOGY_REVIEW.md L52 status flip with merge date; detail section L583-605 rewritten to the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns template mirroring HAD / ContinuousDiD / DCDH. L27 example re-pointed; priority queue items #7-#10 renumbered to #6-#9. - REGISTRY.md `## WooldridgeDiD (ETWFE)` extended with `### Deviations from the paper / from R / library extensions` block consolidating 7 surviving deviations + opt-in notes for cohort_share + cohort_trends + survey rejection + bootstrap cohort_share rejection contracts. - CHANGELOG.md `[Unreleased]` `### Added` documents the new parameters, R-parity extension, and tracker flip. - `docs/methodology/papers/wooldridge-2025-review.md` Requirements Checklist + Gaps & Uncertainties items 1 + 11 marked `**Status:** Closed in PR-B`. - `docs/api/wooldridge_etwfe.rst` updated with weighting-scheme notes alongside the existing aggregation table. Second of two PRs for the WooldridgeDiD methodology-review-tracker promotion. PR-A merged at e416aed (#484). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
May 22, 2026
…ment criteria The CI AI reviewer (openai/codex-action@v1) was missing P2 findings the local single-shot Responses API path consistently catches. Smoking gun on PR igerber#412 SHA 7f7e3d5: CI Codex returned a clean verdict with zero P2 findings; the single-shot path on the same SHA flagged 3 real P2 gaps (docstring drift, missing survey-composition tests, REGISTRY cross-doc inconsistency), all real, two of which the user shipped a fix for in the next commit. Three coordinated changes: - Replace the openai/codex-action@v1 step in ai_pr_review.yml with a Python step that invokes .claude/scripts/openai_review.py in a new --ci-mode. Reuses the proven single-shot architecture instead of a parallel pipeline that drops findings. Workflow renamed from "AI PR Review (Codex)" to "AI PR Review". Existing <!-- ai-pr-review:codex:* --> comment markers preserved for backward compat with historical PR canonical comments. - Add a directive Audit igerber#6 ("Claim-vs-shipped audit") to the Single-Pass Completeness Mandate in pr_review.md and its local-mode substitution in openai_review.py. Actively cross-references REGISTRY / CHANGELOG / PR-body claims against implementation, tests, public docstrings, rendering surfaces, and cross-doc consistency, flagging absences per the deferral rule. Required because the original prompt let reviewers enumerate what exists without auditing what was claimed but missing. - Tighten assessment criteria so unmitigated P2 findings produce a "Needs changes" verdict (not "Looks good"), with explicit "must enumerate" wording preventing silent skips and a targeted carve-out preventing TODO.md from laundering shipped-behavior test gaps to P3. Mitigation via REGISTRY.md Notes or pre-existing TODO entries remains available for non-shipped-claim P2s. The script's _SUBSTITUTIONS list is split into _LOCAL_FRAMING_SUBSTITUTIONS (9 PR -> code-change tuples, applied only in local mode) and _MANDATE_SUBSTITUTIONS (1 tuple, applied to all single-shot uses since neither has tool access). New CLI flags --ci-mode, --pr-title, --pr-body plumb PR context into a "## PR Context" prompt section in CI mode. PR body wrapped in <pr-body untrusted="true">...</pr-body> with a case/whitespace-tolerant regex stripping any literal closing tags. Pre-merge verification on PR igerber#412 SHA 7f7e3d5 with the modified script (--ci-mode --full-registry --context standard --model gpt-5.5): verdict "Needs changes"; surfaces all 3 of the smoking-gun P2s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
May 22, 2026
…GELOG H1 scope - R6 fix left METHODOLOGY_REVIEW.md Deviations item igerber#6 stale (only updated the Verified-Components row). Item igerber#6 still said "Eq. 18 linear-trend- detrended joint Stute deferred". Rewritten to match the rest of the HAD tracker: trends_lin=True is SHIPPED + R-parity-locked in test_did_had_parity.py; the methodology-walkthrough file deliberately doesn't duplicate that coverage; the Pierce-Schott published-value numerical replication is what's waived (Deviations Note igerber#3). - R6 narrowed the Verified-Components row + class docstring but missed the CHANGELOG bullet, which still claimed "joint Stute pre-trends + homogeneity H0 fail-to-reject + H1 reject under nonlinear DGP". Narrowed to: "H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP" — matches the test file's actual scope. All 35 methodology tests pass; lint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF
pushed a commit
to HanomicsIMF/diff-diff
that referenced
this pull request
May 22, 2026
Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item igerber#6 removed and items igerber#7-igerber#11 renumbered to igerber#6-igerber#10. No source code changes, no new tests, no new docstrings — documentation consolidation only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Extend the library to optionally work across multiple time periods: